Overview

Dataset statistics

Number of variables: 13
Number of observations: 740
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 75.3 KiB
Average record size in memory: 104.2 B

Variable types

Numeric: 6
Categorical: 7

Alerts

df_index is highly correlated with age and 4 other fields (High correlation)
age is highly correlated with df_index (High correlation)
cp is highly correlated with exang (High correlation)
chol is highly correlated with df_index and 1 other fields (High correlation)
restecg is highly correlated with df_index (High correlation)
thalach is highly correlated with exang (High correlation)
exang is highly correlated with cp and 2 other fields (High correlation)
oldpeak is highly correlated with exang and 1 other fields (High correlation)
num is highly correlated with df_index and 1 other fields (High correlation)
dataset is highly correlated with df_index and 1 other fields (High correlation)
df_index has unique values (Unique)
chol has 79 (10.7%) zeros (Zeros)
oldpeak has 329 (44.5%) zeros (Zeros)
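
The percentages in the two Zeros alerts follow directly from the zero counts and the 740 observations listed in the overview. A minimal sketch of that arithmetic is given here; the helper name `zeros_alert` is hypothetical and is not part of pandas-profiling.

```python
def zeros_alert(column: str, zero_count: int, n_rows: int) -> str:
    """Build a 'Zeros' alert line from a zero count and the row count."""
    share = zero_count / n_rows
    return f"{column} has {zero_count} ({share:.1%}) zeros"


N_ROWS = 740  # "Number of observations" from the overview

chol_alert = zeros_alert("chol", 79, N_ROWS)
oldpeak_alert = zeros_alert("oldpeak", 329, N_ROWS)
print(chol_alert)
print(oldpeak_alert)
```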

Reproduction

Analysis started: 2022-10-17 20:05:27.779083
Analysis finished: 2022-10-17 20:05:30.803593
Duration: 3.02 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 740
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 419.1108108
Minimum: 0
Maximum: 919
Zeros: 1
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 40.95
Q1: 211.75
median: 402.5
Q3: 587.25
95-th percentile: 860.15
Maximum: 919
Range: 919
Interquartile range (IQR): 375.5

Descriptive statistics

Standard deviation: 253.0874974
Coefficient of variation (CV): 0.6038677383
Kurtosis: -0.9496211708
Mean: 419.1108108
Median Absolute Deviation (MAD): 188
Skewness: 0.2394276424
Sum: 310142
Variance: 64053.28134
Monotonicity: Strictly increasing
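
Several of the figures above are derived from the others, which gives a quick consistency check. The sketch below recomputes the mean, coefficient of variation, variance, range and IQR for df_index from the reported values. It assumes the usual definitions (CV = standard deviation / mean, variance = standard deviation squared); the variable names are illustrative only.

```python
# Values copied from the df_index statistics in this report.
n_rows = 740
total = 310142
std = 253.0874974
q1, q3 = 211.75, 587.25
minimum, maximum = 0, 919

mean = total / n_rows            # Mean = Sum / number of observations
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance is the squared standard deviation
iqr = q3 - q1                    # Interquartile range
value_range = maximum - minimum  # Range

print(mean, cv, variance, iqr, value_range)
```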
Histogram with fixed size bins (bins=50)

Common values

Value               Count   Frequency (%)
0                   1       0.1%
530                 1       0.1%
521                 1       0.1%
522                 1       0.1%
523                 1       0.1%
524                 1       0.1%
525                 1       0.1%
526                 1       0.1%
527                 1       0.1%
528                 1       0.1%
Other values (730)  730     98.6%

Minimum 10 values

Value               Count   Frequency (%)
0                   1       0.1%
1                   1       0.1%
3                   1       0.1%
4                   1       0.1%
5                   1       0.1%
6                   1       0.1%
7                   1       0.1%
8                   1       0.1%
9                   1       0.1%
10                  1       0.1%

Maximum 10 values

Value               Count   Frequency (%)
919                 1       0.1%
917                 1       0.1%
915                 1       0.1%
914                 1       0.1%
913                 1       0.1%
912                 1       0.1%
911                 1       0.1%
910                 1       0.1%
909                 1       0.1%
908                 1       0.1%

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 50
Distinct (%): 6.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 53.0972973
Minimum: 28
Maximum: 77
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 28
5-th percentile: 37
Q1: 46
median: 54
Q3: 60
95-th percentile: 68
Maximum: 77
Range: 49
Interquartile range (IQR): 14

Descriptive statistics

Standard deviation: 9.408126694
Coefficient of variation (CV): 0.1771865457
Kurtosis: -0.4197528623
Mean: 53.0972973
Median Absolute Deviation (MAD): 7
Skewness: -0.161467334
Sum: 39292
Variance: 88.5128479
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count   Frequency (%)
54                  43      5.8%
58                  38      5.1%
55                  33      4.5%
57                  31      4.2%
56                  30      4.1%
52                  30      4.1%
59                  28      3.8%
51                  25      3.4%
53                  24      3.2%
60                  24      3.2%
Other values (40)   434     58.6%

Minimum 10 values

Value               Count   Frequency (%)
28                  1       0.1%
29                  2       0.3%
30                  1       0.1%
31                  2       0.3%
32                  4       0.5%
33                  2       0.3%
34                  6       0.8%
35                  9       1.2%
36                  5       0.7%
37                  11      1.5%

Maximum 10 values

Value               Count   Frequency (%)
77                  2       0.3%
76                  1       0.1%
75                  3       0.4%
74                  4       0.5%
73                  1       0.1%
72                  1       0.1%
71                  4       0.5%
70                  7       0.9%
69                  7       0.9%
68                  8       1.1%

sex
Categorical

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
Value counts: 1.0 (566), 0.0 (174)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2220
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
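
The character statistics reported for sex can be reproduced with the Python standard library. This is a minimal sketch, assuming the column is stored as the strings "1.0" and "0.0" (consistent with every value having length 3) and using the value counts 566 and 174 from this report. It is not pandas-profiling's own implementation.

```python
import unicodedata
from collections import Counter

# Assumption: sex holds the strings "1.0" (566 rows) and "0.0" (174 rows).
values = ["1.0"] * 566 + ["0.0"] * 174
text = "".join(values)

# Per-character frequencies ("Most occurring characters").
char_counts = Counter(text)

# Unicode general category of every character:
# "Nd" is Decimal Number, "Po" is Other Punctuation.
category_counts = Counter(unicodedata.category(ch) for ch in text)

print(len(text))        # total characters
print(char_counts)
print(category_counts)
```

The counts agree with the report: 2220 characters in total, of which "0" occurs 914 times, "." 740 times and "1" 566 times.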

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value               Count   Frequency (%)
1.0                 566     76.5%
0.0                 174     23.5%

Length

Histogram of lengths of the category

Category Frequency Plot

Value               Count   Frequency (%)
1.0                 566     76.5%
0.0                 174     23.5%

Most occurring characters

Value               Count   Frequency (%)
0                   914     41.2%
.                   740     33.3%
1                   566     25.5%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      1480    66.7%
Other Punctuation   740     33.3%

Most frequent character per category

Decimal Number
Value               Count   Frequency (%)
0                   914     61.8%
1                   566     38.2%

Other Punctuation
Value               Count   Frequency (%)
.                   740     100.0%

Most occurring scripts

Value               Count   Frequency (%)
Common              2220    100.0%

Most frequent character per script

Common
Value               Count   Frequency (%)
0                   914     41.2%
.                   740     33.3%
1                   566     25.5%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               2220    100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
0                   914     41.2%
.                   740     33.3%
1                   566     25.5%

cp
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
Value counts: 4.0 (392), 3.0 (161), 2.0 (150), 1.0 (37)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2220
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 2.0
3rd row: 1.0
4th row: 2.0
5th row: 2.0

Common Values

Value               Count   Frequency (%)
4.0                 392     53.0%
3.0                 161     21.8%
2.0                 150     20.3%
1.0                 37      5.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value               Count   Frequency (%)
4.0                 392     53.0%
3.0                 161     21.8%
2.0                 150     20.3%
1.0                 37      5.0%

Most occurring characters

Value               Count   Frequency (%)
.                   740     33.3%
0                   740     33.3%
4                   392     17.7%
3                   161     7.3%
2                   150     6.8%
1                   37      1.7%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      1480    66.7%
Other Punctuation   740     33.3%

Most frequent character per category

Decimal Number
Value               Count   Frequency (%)
0                   740     50.0%
4                   392     26.5%
3                   161     10.9%
2                   150     10.1%
1                   37      2.5%

Other Punctuation
Value               Count   Frequency (%)
.                   740     100.0%

Most occurring scripts

Value               Count   Frequency (%)
Common              2220    100.0%

Most frequent character per script

Common
Value               Count   Frequency (%)
.                   740     33.3%
0                   740     33.3%
4                   392     17.7%
3                   161     7.3%
2                   150     6.8%
1                   37      1.7%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               2220    100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
.                   740     33.3%
0                   740     33.3%
4                   392     17.7%
3                   161     7.3%
2                   150     6.8%
1                   37      1.7%

trestbps
Real number (ℝ≥0)

Distinct: 58
Distinct (%): 7.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 132.7540541
Minimum: 0
Maximum: 200
Zeros: 1
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 110
Q1: 120
median: 130
Q3: 140
95-th percentile: 160.2
Maximum: 200
Range: 200
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 18.58124966
Coefficient of variation (CV): 0.1399674743
Kurtosis: 3.80309551
Mean: 132.7540541
Median Absolute Deviation (MAD): 10
Skewness: 0.1784665755
Sum: 98238
Variance: 345.2628388
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count   Frequency (%)
120                 119     16.1%
130                 102     13.8%
140                 86      11.6%
150                 50      6.8%
110                 45      6.1%
160                 43      5.8%
125                 23      3.1%
128                 17      2.3%
135                 15      2.0%
138                 15      2.0%
Other values (48)   225     30.4%

Minimum 10 values

Value               Count   Frequency (%)
0                   1       0.1%
92                  1       0.1%
94                  2       0.3%
96                  1       0.1%
98                  1       0.1%
100                 10      1.4%
101                 1       0.1%
102                 2       0.3%
104                 2       0.3%
105                 5       0.7%

Maximum 10 values

Value               Count   Frequency (%)
200                 3       0.4%
192                 1       0.1%
190                 2       0.3%
185                 1       0.1%
180                 10      1.4%
178                 3       0.4%
174                 1       0.1%
172                 2       0.3%
170                 12      1.6%
165                 1       0.1%

chol
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 208
Distinct (%): 28.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 220.1364865
Minimum: 0
Maximum: 603
Zeros: 79
Zeros (%): 10.7%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 197
median: 231
Q3: 271
95-th percentile: 336.05
Maximum: 603
Range: 603
Interquartile range (IQR): 74

Descriptive statistics

Standard deviation: 93.61455549
Coefficient of variation (CV): 0.4252568803
Kurtosis: 1.81688245
Mean: 220.1364865
Median Absolute Deviation (MAD): 37
Skewness: -0.8091404209
Sum: 162901
Variance: 8763.685
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count   Frequency (%)
0                   79      10.7%
220                 10      1.4%
254                 10      1.4%
230                 9       1.2%
223                 9       1.2%
260                 8       1.1%
211                 8       1.1%
246                 8       1.1%
216                 8       1.1%
219                 8       1.1%
Other values (198)  583     78.8%

Minimum 10 values

Value               Count   Frequency (%)
0                   79      10.7%
85                  1       0.1%
100                 2       0.3%
117                 1       0.1%
126                 1       0.1%
129                 1       0.1%
131                 1       0.1%
132                 1       0.1%
141                 1       0.1%
147                 2       0.3%

Maximum 10 values

Value               Count   Frequency (%)
603                 1       0.1%
564                 1       0.1%
529                 1       0.1%
518                 1       0.1%
491                 1       0.1%
458                 1       0.1%
417                 1       0.1%
412                 1       0.1%
409                 1       0.1%
407                 1       0.1%

fbs
Categorical

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
Value counts: 0.0 (629), 1.0 (111)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2220
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value               Count   Frequency (%)
0.0                 629     85.0%
1.0                 111     15.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value               Count   Frequency (%)
0.0                 629     85.0%
1.0                 111     15.0%

Most occurring characters

Value               Count   Frequency (%)
0                   1369    61.7%
.                   740     33.3%
1                   111     5.0%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      1480    66.7%
Other Punctuation   740     33.3%

Most frequent character per category

Decimal Number
Value               Count   Frequency (%)
0                   1369    92.5%
1                   111     7.5%

Other Punctuation
Value               Count   Frequency (%)
.                   740     100.0%

Most occurring scripts

Value               Count   Frequency (%)
Common              2220    100.0%

Most frequent character per script

Common
Value               Count   Frequency (%)
0                   1369    61.7%
.                   740     33.3%
1                   111     5.0%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               2220    100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
0                   1369    61.7%
.                   740     33.3%
1                   111     5.0%

restecg
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
Value counts: 0.0 (445), 2.0 (175), 1.0 (120)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2220
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 0.0
3rd row: 1.0
4th row: 1.0
5th row: 0.0

Common Values

Value               Count   Frequency (%)
0.0                 445     60.1%
2.0                 175     23.6%
1.0                 120     16.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value               Count   Frequency (%)
0.0                 445     60.1%
2.0                 175     23.6%
1.0                 120     16.2%

Most occurring characters

Value               Count   Frequency (%)
0                   1185    53.4%
.                   740     33.3%
2                   175     7.9%
1                   120     5.4%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      1480    66.7%
Other Punctuation   740     33.3%

Most frequent character per category

Decimal Number
Value               Count   Frequency (%)
0                   1185    80.1%
2                   175     11.8%
1                   120     8.1%

Other Punctuation
Value               Count   Frequency (%)
.                   740     100.0%

Most occurring scripts

Value               Count   Frequency (%)
Common              2220    100.0%

Most frequent character per script

Common
Value               Count   Frequency (%)
0                   1185    53.4%
.                   740     33.3%
2                   175     7.9%
1                   120     5.4%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               2220    100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
0                   1185    53.4%
.                   740     33.3%
2                   175     7.9%
1                   120     5.4%

thalach
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 115
Distinct (%): 15.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 138.7445946
Minimum: 60
Maximum: 202
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.9 KiB

Quantile statistics

Minimum: 60
5-th percentile: 96
Q1: 120
median: 140
Q3: 159.25
95-th percentile: 179
Maximum: 202
Range: 142
Interquartile range (IQR): 39.25

Descriptive statistics

Standard deviation: 25.84608152
Coefficient of variation (CV): 0.1862853223
Kurtosis: -0.5418561084
Mean: 138.7445946
Median Absolute Deviation (MAD): 20
Skewness: -0.2298360761
Sum: 102671
Variance: 668.0199301
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value               Count   Frequency (%)
150                 38      5.1%
140                 37      5.0%
120                 26      3.5%
130                 26      3.5%
160                 22      3.0%
170                 18      2.4%
125                 18      2.4%
110                 15      2.0%
115                 14      1.9%
142                 14      1.9%
Other values (105)  512     69.2%

Minimum 10 values

Value               Count   Frequency (%)
60                  1       0.1%
63                  1       0.1%
69                  1       0.1%
71                  1       0.1%
72                  1       0.1%
73                  1       0.1%
80                  2       0.3%
82                  1       0.1%
83                  1       0.1%
84                  3       0.4%

Maximum 10 values

Value               Count   Frequency (%)
202                 1       0.1%
195                 1       0.1%
194                 1       0.1%
192                 1       0.1%
190                 2       0.3%
188                 1       0.1%
187                 1       0.1%
186                 2       0.3%
185                 4       0.5%
184                 4       0.5%

exang
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
Value counts: 0.0 (444), 1.0 (296)

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2220
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value               Count   Frequency (%)
0.0                 444     60.0%
1.0                 296     40.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value               Count   Frequency (%)
0.0                 444     60.0%
1.0                 296     40.0%

Most occurring characters

Value               Count   Frequency (%)
0                   1184    53.3%
.                   740     33.3%
1                   296     13.3%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      1480    66.7%
Other Punctuation   740     33.3%

Most frequent character per category

Decimal Number
Value               Count   Frequency (%)
0                   1184    80.0%
1                   296     20.0%

Other Punctuation
Value               Count   Frequency (%)
.                   740     100.0%

Most occurring scripts

Value               Count   Frequency (%)
Common              2220    100.0%

Most frequent character per script

Common
Value               Count   Frequency (%)
0                   1184    53.3%
.                   740     33.3%
1                   296     13.3%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               2220    100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
0                   1184    53.3%
.                   740     33.3%
1                   296     13.3%

oldpeak
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 44
Distinct (%): 5.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.8943243243
Minimum: -1
Maximum: 6.2
Zeros: 329
Zeros (%): 44.5%
Negative: 2
Negative (%): 0.3%
Memory size: 5.9 KiB

Quantile statistics

Minimum: -1
5-th percentile: 0
Q1: 0
median: 0.5
Q3: 1.5
95-th percentile: 3
Maximum: 6.2
Range: 7.2
Interquartile range (IQR): 1.5

Descriptive statistics

Standard deviation: 1.08715975
Coefficient of variation (CV): 1.215621359
Kurtosis: 1.248792798
Mean: 0.8943243243
Median Absolute Deviation (MAD): 0.5
Skewness: 1.206874625
Sum: 661.8
Variance: 1.181916322
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=44)

Common values

Value               Count   Frequency (%)
0                   329     44.5%
1                   73      9.9%
2                   64      8.6%
1.5                 38      5.1%
3                   26      3.5%
0.5                 18      2.4%
1.2                 17      2.3%
0.8                 15      2.0%
0.6                 14      1.9%
1.4                 13      1.8%
Other values (34)   133     18.0%

Minimum 10 values

Value               Count   Frequency (%)
-1                  1       0.1%
-0.5                1       0.1%
0                   329     44.5%
0.1                 7       0.9%
0.2                 12      1.6%
0.3                 4       0.5%
0.4                 9       1.2%
0.5                 18      2.4%
0.6                 14      1.9%
0.7                 1       0.1%

Maximum 10 values

Value               Count   Frequency (%)
6.2                 1       0.1%
5.6                 1       0.1%
5                   1       0.1%
4.4                 1       0.1%
4.2                 2       0.3%
4                   8       1.1%
3.8                 1       0.1%
3.6                 4       0.5%
3.5                 1       0.1%
3.4                 3       0.4%

num
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
Value counts: 0 (357), 1 (204), 2 (79), 3 (78), 4 (22)

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 740
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value               Count   Frequency (%)
0                   357     48.2%
1                   204     27.6%
2                   79      10.7%
3                   78      10.5%
4                   22      3.0%

Length

Histogram of lengths of the category

Category Frequency Plot

Value               Count   Frequency (%)
0                   357     48.2%
1                   204     27.6%
2                   79      10.7%
3                   78      10.5%
4                   22      3.0%

Most occurring characters

Value               Count   Frequency (%)
0                   357     48.2%
1                   204     27.6%
2                   79      10.7%
3                   78      10.5%
4                   22      3.0%

Most occurring categories

Value               Count   Frequency (%)
Decimal Number      740     100.0%

Most frequent character per category

Decimal Number
Value               Count   Frequency (%)
0                   357     48.2%
1                   204     27.6%
2                   79      10.7%
3                   78      10.5%
4                   22      3.0%

Most occurring scripts

Value               Count   Frequency (%)
Common              740     100.0%

Most frequent character per script

Common
Value               Count   Frequency (%)
0                   357     48.2%
1                   204     27.6%
2                   79      10.7%
3                   78      10.5%
4                   22      3.0%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               740     100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
0                   357     48.2%
1                   204     27.6%
2                   79      10.7%
3                   78      10.5%
4                   22      3.0%

dataset
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 5.9 KiB
Value counts: cleveland (303), hungarian (261), va (130), switzerland (46)

Length

Max length: 11
Median length: 9
Mean length: 7.894594595
Min length: 2

Characters and Unicode

Total characters: 5842
Distinct characters: 16
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: hungarian
2nd row: hungarian
3rd row: hungarian
4th row: hungarian
5th row: hungarian

Common Values

Value               Count   Frequency (%)
cleveland           303     40.9%
hungarian           261     35.3%
va                  130     17.6%
switzerland         46      6.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value               Count   Frequency (%)
cleveland           303     40.9%
hungarian           261     35.3%
va                  130     17.6%
switzerland         46      6.2%

Most occurring characters

Value               Count   Frequency (%)
a                   1001    17.1%
n                   871     14.9%
l                   652     11.2%
e                   652     11.2%
v                   433     7.4%
d                   349     6.0%
r                   307     5.3%
i                   307     5.3%
c                   303     5.2%
h                   261     4.5%
Other values (6)    706     12.1%

Most occurring categories

Value               Count   Frequency (%)
Lowercase Letter    5842    100.0%

Most frequent character per category

Lowercase Letter
Value               Count   Frequency (%)
a                   1001    17.1%
n                   871     14.9%
l                   652     11.2%
e                   652     11.2%
v                   433     7.4%
d                   349     6.0%
r                   307     5.3%
i                   307     5.3%
c                   303     5.2%
h                   261     4.5%
Other values (6)    706     12.1%

Most occurring scripts

Value               Count   Frequency (%)
Latin               5842    100.0%

Most frequent character per script

Latin
Value               Count   Frequency (%)
a                   1001    17.1%
n                   871     14.9%
l                   652     11.2%
e                   652     11.2%
v                   433     7.4%
d                   349     6.0%
r                   307     5.3%
i                   307     5.3%
c                   303     5.2%
h                   261     4.5%
Other values (6)    706     12.1%

Most occurring blocks

Value               Count   Frequency (%)
ASCII               5842    100.0%

Most frequent character per block

ASCII
Value               Count   Frequency (%)
a                   1001    17.1%
n                   871     14.9%
l                   652     11.2%
e                   652     11.2%
v                   433     7.4%
d                   349     6.0%
r                   307     5.3%
i                   307     5.3%
c                   303     5.2%
h                   261     4.5%
Other values (6)    706     12.1%

Interactions

[36 interaction plots (6 × 6 grid over the numeric variables); images not preserved]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
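
The calculation described above can be sketched in plain Python: rank both variables (ties receive the average of their positions) and apply the covariance over standard deviations formula to the ranks. The function names are illustrative and the sketch is not the routine the report itself calls.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear in x
rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

Because y increases monotonically with x, ρ is 1 (up to floating-point rounding) while Pearson's r on the raw values stays below 1.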

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
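
A minimal sketch of this formula, together with a check of the location and scale invariance mentioned above. The input values are the age and thalach columns of the first five sample rows; the function name and the shift and scale factors are arbitrary choices for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # The 1/n factors of the covariance and the standard deviations cancel.
    return cov / (sx * sy)


age = [28, 29, 30, 31, 32]
thalach = [185, 160, 170, 150, 165]

r = pearson_r(age, thalach)

# Shifting and rescaling each variable by a positive factor leaves r unchanged.
r_rescaled = pearson_r([2 * a + 10 for a in age], [0.5 * t - 3 for t in thalach])
print(r, r_rescaled)
```

A negative scale factor on one variable would flip the sign of r, so the invariance holds for positive rescaling only.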

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
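
The pair-counting definition above corresponds to the variant usually called τ-a. A direct sketch that checks every pair of observations is shown here; it counts tied pairs as neither concordant nor discordant, and the tie-adjusted τ-b that many statistics libraries report is not shown. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
    return (concordant - discordant) / len(pairs)


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six
tau = kendall_tau_a(x, y)
print(tau)
```

With five concordant pairs and one discordant pair out of six, τ is (5 - 1) / 6, about 0.667.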

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
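
A sketch of a bias-corrected Cramér's V computed from a contingency table of counts, following the correction commonly attributed to Bergsma (2013). The function name is hypothetical, and details such as whether the report applies a continuity correction to χ² are not stated in the source, so none is applied here.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table (list of rows of counts)."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction of phi squared and of the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0


independent = cramers_v_corrected([[10, 20], [20, 40]])
perfect = cramers_v_corrected([[50, 0], [0, 50]])
print(independent, perfect)
```

The two toy tables hit the ends of the scale: proportional rows give 0, and a table with all mass on the diagonal gives 1.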

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

     df_index  age   sex  cp   trestbps  chol   fbs  restecg  thalach  exang  oldpeak  num  dataset
0    0         28.0  1.0  2.0  130.0     132.0  0.0  2.0      185.0    0.0    0.0      0    hungarian
1    1         29.0  1.0  2.0  120.0     243.0  0.0  0.0      160.0    0.0    0.0      0    hungarian
2    3         30.0  0.0  1.0  170.0     237.0  0.0  1.0      170.0    0.0    0.0      0    hungarian
3    4         31.0  0.0  2.0  100.0     219.0  0.0  1.0      150.0    0.0    0.0      0    hungarian
4    5         32.0  0.0  2.0  105.0     198.0  0.0  0.0      165.0    0.0    0.0      0    hungarian
5    6         32.0  1.0  2.0  110.0     225.0  0.0  0.0      184.0    0.0    0.0      0    hungarian
6    7         32.0  1.0  2.0  125.0     254.0  0.0  0.0      155.0    0.0    0.0      0    hungarian
7    8         33.0  1.0  3.0  120.0     298.0  0.0  0.0      185.0    0.0    0.0      0    hungarian
8    9         34.0  0.0  2.0  130.0     161.0  0.0  0.0      190.0    0.0    0.0      0    hungarian
9    10        34.0  1.0  2.0  150.0     214.0  0.0  1.0      168.0    0.0    0.0      0    hungarian

Last rows

     df_index  age   sex  cp   trestbps  chol   fbs  restecg  thalach  exang  oldpeak  num  dataset
730  908       74.0  1.0  4.0  155.0     310.0  0.0  0.0      112.0    1.0    1.5      2    va
731  909       68.0  1.0  3.0  134.0     254.0  1.0  0.0      151.0    1.0    0.0      0    va
732  910       51.0  0.0  4.0  114.0     258.0  1.0  2.0      96.0     0.0    1.0      0    va
733  911       62.0  1.0  4.0  160.0     254.0  1.0  1.0      108.0    1.0    3.0      4    va
734  912       53.0  1.0  4.0  144.0     300.0  1.0  1.0      128.0    1.0    1.5      3    va
735  913       62.0  1.0  4.0  158.0     170.0  0.0  1.0      138.0    1.0    0.0      1    va
736  914       46.0  1.0  4.0  134.0     310.0  0.0  0.0      126.0    0.0    0.0      2    va
737  915       54.0  0.0  4.0  127.0     333.0  1.0  1.0      154.0    0.0    0.0      1    va
738  917       55.0  1.0  4.0  122.0     223.0  1.0  1.0      100.0    0.0    0.0      2    va
739  919       62.0  1.0  2.0  120.0     254.0  0.0  2.0      93.0     1.0    0.0      1    va